Finite-state Relations Between Two Historically Closely Related Languages
Abstract
Regular correspondences between historically related languages can be modelled using finite-state transducers (FSTs). A new method is presented and demonstrated with a bidirectional experiment between Finnish and Estonian. An artificial representation, resembling a protolanguage, is established between the two related languages. This representation, AFE (Aligned Finnish-Estonian), is based on the letter-by-letter alignment of the two languages and uses mechanically constructed morphophonemes that represent the corresponding characters. By describing the constraints on AFE with two-level rules, one can construct useful mappings between the languages. In this way, the badly ambiguous FSTs from Finnish and Estonian to AFE can be composed into a practically unambiguous transducer from Finnish to Estonian. The inverse mapping, from Estonian to Finnish, is mildly ambiguous. The steps of the proposed method could be repeated as such for dialectal or older written texts: choosing a set of model words, aligning them, recording the mechanical correspondences and designing rules for the constraints could be done with limited effort. For the purposes of indexing and searching, the mild ambiguity may be tolerable as such. The ambiguity can be further reduced by composing the resulting FST with a speller or morphological analyser of the standard language.
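The alignment step described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the model words, the `Ø` zero symbol, and the function names `to_afe`, `project`, `correspondences` and `candidates` are all assumptions made for illustration. The sketch pairs aligned letters into morphophonemes, projects an AFE string back to either language, and shows how ambiguous a purely letter-by-letter Finnish-to-Estonian mapping is before any two-level rules constrain it.

```python
from itertools import product

ZERO = "Ø"  # assumed notation for "no letter" at an alignment position

# Hand-aligned model words (illustrative only, not taken from the paper).
# Each pair is (Finnish, Estonian), padded with ZERO to equal length.
ALIGNED = [
    ("kala", "kala"),    # 'fish'
    ("silmä", "silmØ"),  # 'eye'
    ("jalka", "jalgØ"),  # 'leg'
]


def to_afe(fin, est):
    """Pair two aligned strings into a tuple of (Finnish, Estonian) morphophonemes."""
    if len(fin) != len(est):
        raise ValueError("aligned strings must have equal length")
    return tuple(zip(fin, est))


def project(afe, side):
    """Read one language off an AFE string: side 0 is Finnish, side 1 is Estonian."""
    return "".join(pair[side] for pair in afe if pair[side] != ZERO)


def correspondences(aligned):
    """Record, for each Finnish letter, the Estonian letters it was aligned with."""
    table = {}
    for fin, est in aligned:
        for f, e in to_afe(fin, est):
            table.setdefault(f, set()).add(e)
    return table


def candidates(fin_word, table):
    """Enumerate Estonian strings licensed letter by letter, with no context rules.

    Simplification: Estonian letters aligned with a Finnish zero are ignored.
    """
    options = [sorted(table.get(ch, {ch})) for ch in fin_word]
    return {
        "".join(e for e in combo if e != ZERO)
        for combo in product(*options)
    }


if __name__ == "__main__":
    table = correspondences(ALIGNED)
    print(project(to_afe("jalka", "jalgØ"), 1))
    print(sorted(candidates("kala", table)))
```

Without constraints, even this three-word sample licenses several candidates for a single Finnish word. That is the kind of ambiguity the two-level rules over AFE are meant to remove; a real system would compile the rules and mappings into transducers instead of enumerating strings.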
Similar resources
Analysis of Language Legislation of All 85 Russian Federation’s Subjects (Regions)
The analysis of the language legislation of all 85 subjects of the Russian Federation shows complete heterogeneity and diversity. Common legal guidelines in Federal law do not exist, because Federal legislation is obsolete and is largely whitespace and conflict. The subjects of the Russian Federation, on whose territory different ethnic groups, both large and indigenous, historically live, solv...
A Machine Translation System Between a Pair of Closely Related Languages
Machine translation between closely related languages is easier than between language pairs that are not related with each other. Having many parts of their grammars and vocabularies in common reduces the amount of effort needed to develop a translation system between related languages. A translation system that makes a morphological analysis supported by simpler translation rules and context d...
On the Graphs Related to Green Relations of Finite Semigroups
In this paper we develop an analog of the notion of the conjugacy graph of finite groups for finite semigroups by considering the Green relations of a finite semigroup. More precisely, by defining the new graphs $\Gamma_{L}(S)$, $\Gamma_{H}(S)$, $\Gamma_{J}(S)$ and $\Gamma_{D}(S)$ (we name them the Green graphs) related to the Green relations L, R, J, H and D of a finite semigroup S, we first atte...
Internship report - Streaming String Transducers
In formal language theory, two very different models sometimes turn out to describe the same class of languages. This usually shows that there is a fundamental concept described by those models. A well-known example is the class of regular languages, which can be characterized by logic (monadic second order (MSO) logic), algebra (syntactic monoids), and many computational models (automata). In ...
Using Mazurkiewicz Trace Languages for Partition-Based Morphology
Partition-based morphology is an approach of finite-state morphology where a grammar describes a special kind of regular relations, which split all the strings of a given tuple into the same number of substrings. They are compiled in finite-state machines. In this paper, we address the question of merging grammars using different partitionings into a single finite-state machine. A morphological...